wasiahmad/SumGenToBT

Summarize and Generate to Back-translate

Official code release of our work, Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages.

Setup

We recommend running the experiments in a conda environment and assume Anaconda is already installed. The additional requirements (listed in requirements.txt) can be installed by running the following script:

bash install_env.sh

Then build the tree_sitter library for Java and Python by running:

python build.py

Finally, download the pre-trained PLBART checkpoints.

cd plbart
bash download.sh

Two model sizes are available, so experiments can be run with MODEL_SIZE set to either base or large.
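Since only two values are meaningful for MODEL_SIZE, a wrapper script can reject anything else before launching a long job. The following is a minimal sketch; `check_model_size` is a hypothetical helper and is not part of this repository.

```shell
# Hypothetical helper (not part of the repository): accept only the two
# model sizes named in this README and fail early on anything else.
check_model_size() {
    case "$1" in
        base|large)
            return 0
            ;;
        *)
            echo "MODEL_SIZE must be 'base' or 'large', got '$1'" >&2
            return 1
            ;;
    esac
}
```

For example, `check_model_size "$MODEL_SIZE" || exit 1` at the top of a wrapper script stops the run on a typo such as `larg`.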

Train

Step 1. Summarization and Generation

cd sumgen
bash run.sh GPU_ID [MODEL_SIZE]

Step 2. Back-translation

cd plbart
bash train.sh GPU_ID [MODEL_SIZE]
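The two training steps above can be chained so that back-translation starts only if the first step succeeds. This is a sketch under stated assumptions: `train_pipeline` and the `RUNNER` variable are hypothetical names, the function is assumed to be called from the repository root, and both arguments are passed explicitly because the default MODEL_SIZE is not stated here.

```shell
# Hypothetical wrapper (not part of the repository). Assumes the current
# directory is the repository root, so that sumgen/ and plbart/ exist.
# The body runs in a subshell, so variables and `cd` do not leak out.
train_pipeline() (
    gpu_id="$1"
    model_size="$2"
    # RUNNER is an assumption for illustration: set RUNNER=echo to print
    # the script invocations instead of executing them.
    runner="${RUNNER:-bash}"

    # Step 1: summarization and generation.
    (cd sumgen && "$runner" run.sh "$gpu_id" "$model_size") || exit 1

    # Step 2: back-translation, reached only if step 1 succeeded.
    (cd plbart && "$runner" train.sh "$gpu_id" "$model_size")
)
```

Calling `train_pipeline 0 base` would then run both steps on GPU 0 with the base model.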

Evaluation

Evaluate SumGen model

cd sumgen/evaluation
bash decode.sh GPU_ID SOURCE TARGET MODEL_SIZE BEAM_SIZE
bash evaluate.sh SAVE_DIR SOURCE TARGET

For example, run the following commands to get results with default settings.

cd sumgen/evaluation
# to evaluate base model
bash decode.sh 0 java python base 10
bash evaluate.sh base_java_python_b10 java python
# to evaluate large model
bash decode.sh 0 java python large 10
bash evaluate.sh large_java_python_b10 java python
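In the examples above, the SAVE_DIR passed to evaluate.sh appears to be built from the decode.sh arguments as MODEL_SIZE, SOURCE, TARGET, and BEAM_SIZE. The sketch below rebuilds that name; `save_dir_name` is a hypothetical helper, and the pattern is inferred from the two examples rather than read from the scripts.

```shell
# Hypothetical helper (not part of the repository): rebuild the SAVE_DIR
# name from the arguments given to decode.sh. The naming pattern is
# inferred from the examples in this README, e.g. base_java_python_b10.
save_dir_name() {
    # $1 = SOURCE, $2 = TARGET, $3 = MODEL_SIZE, $4 = BEAM_SIZE
    printf '%s_%s_%s_b%s\n' "$3" "$1" "$2" "$4"
}
```

With it, the evaluation call can be written as `bash evaluate.sh "$(save_dir_name java python base 10)" java python`, which keeps the directory name in step with the decoding settings.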

Evaluate PLBART

cd scripts
bash run.sh GPU_ID

License

The contents of this repository are released under the MIT license. The license also applies to the pre-trained and fine-tuned models.

Citation

If you use any of the datasets, models or code modules, please cite the following paper:

@article{ahmad2022sumgen,
  author    = {Wasi Uddin Ahmad and Saikat Chakraborty and Baishakhi Ray and Kai-Wei Chang},
  title     = {Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages},
  journal   = {CoRR},
  volume    = {abs/2205.11116},
  year      = {2022},
  url       = {https://arxiv.org/abs/2205.11116},
  eprinttype = {arXiv},
  eprint    = {2205.11116}
}
